Concentration inequalities for order statistics
This note describes non-asymptotic variance and tail bounds for order
statistics of samples of independent identically distributed random variables.
These bounds are shown to be asymptotically tight when the sampling
distribution belongs to a maximum domain of attraction. If the sampling
distribution has non-decreasing hazard rate (this includes the Gaussian
distribution), we derive an exponential Efron-Stein inequality for order
statistics: an inequality connecting the logarithmic moment generating function
of centered order statistics with exponential moments of Efron-Stein
(jackknife) estimates of variance. We use this general connection to derive
variance and tail bounds for order statistics of a Gaussian sample. These bounds
are not within the scope of the Tsirelson-Ibragimov-Sudakov Gaussian
concentration inequality. The proofs are elementary and combine Rényi's
representation of order statistics with the so-called entropy approach
to concentration inequalities popularized by M. Ledoux.
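As a quick numerical illustration of the Rényi representation invoked above, the sketch below (Python; the helper name `renyi_order_stats` is ours) builds the order statistics of n i.i.d. Exp(1) variables as cumulative sums of rescaled exponential spacings, and checks that the sample mean of the maximum matches its known expectation, the harmonic number H_n:

```python
import random

def renyi_order_stats(n, rng):
    """Order statistics of n i.i.d. Exp(1) variables via Renyi's
    representation: X_(k) = sum_{j<=k} E_j / (n - j + 1), E_j i.i.d. Exp(1)."""
    xs = []
    s = 0.0
    for j in range(1, n + 1):
        s += rng.expovariate(1.0) / (n - j + 1)
        xs.append(s)
    return xs  # already sorted: X_(1) <= ... <= X_(n)

rng = random.Random(0)
n, reps = 10, 20000
mean_max = sum(renyi_order_stats(n, rng)[-1] for _ in range(reps)) / reps
harmonic = sum(1.0 / i for i in range(1, n + 1))  # E[X_(n)] = H_n
```

The construction yields all n order statistics in a single pass, which is what makes the representation convenient for the variance and tail bounds discussed above.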
About Adaptive Coding on Countable Alphabets: Max-Stable Envelope Classes
In this paper, we study the problem of lossless universal source coding for
stationary memoryless sources on countably infinite alphabets. This task is
generally not achievable without restricting the class of sources over which
universality is desired. Building on our prior work, we propose natural
families of sources characterized by a common dominating envelope. We
particularly emphasize the notion of adaptivity, which is the ability to
perform as well as an oracle knowing the envelope, without actually knowing it.
This is closely related to the notion of hierarchical universal source coding,
but with the important difference that families of envelope classes are not
discretely indexed and not necessarily nested.
Our contribution is to extend the classes of envelopes over which adaptive
universal source coding is possible, namely by including max-stable
(heavy-tailed) envelopes which are excellent models in many applications, such
as natural language modeling. We derive a minimax lower bound on the redundancy
of any code on such envelope classes, including an oracle that knows the
envelope. We then propose a constructive code that does not use knowledge of
the envelope. The code is computationally efficient and is structured to use an
Expanding Threshold for Auto-Censoring, and we therefore dub it the
ETAC-code. We prove that the ETAC-code achieves the lower
bound on the minimax redundancy within a factor logarithmic in the sequence
length, and can therefore be qualified as a near-adaptive code over families of
heavy-tailed envelopes. For finite and light-tailed envelopes the penalty is
even smaller, and the same code closely matches previous results that explicitly
made the light-tailed assumption. Our technical results are founded on methods
from regular variation theory and concentration of measure.
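The threshold-with-escape idea behind auto-censoring can be illustrated with a toy sketch (Python). This is purely illustrative and is not the actual ETAC construction: the helper names, the Elias gamma code for integers, and the linear threshold are all our assumptions. Symbols at or below a growing threshold are coded in place; larger ones emit an escape and are deferred to a side stream:

```python
def elias_gamma(n):
    """Elias gamma code for a positive integer n: a unary prefix giving
    the bit length, followed by the binary digits of n."""
    assert n >= 1
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def censored_encode(symbols, threshold):
    """Toy auto-censoring pass (illustrative, not the ETAC code):
    symbols up to threshold(i) are coded in place, shifted by one so
    that codeword 1 stays free as an escape; larger symbols emit the
    escape and their value goes to a side stream."""
    main, side = [], []
    for i, s in enumerate(symbols, start=1):
        if s <= threshold(i):
            main.append(elias_gamma(s + 1))
        else:
            main.append(elias_gamma(1))  # escape marker
            side.append(elias_gamma(s))
    return "".join(main), "".join(side)

main, side = censored_encode([3, 50, 2, 7], threshold=lambda i: 4 * i)
```

Because the threshold expands with the position i, rare large symbols are escaped early on but absorbed into the main stream once the threshold has grown past them, which is the intuition behind censoring against a heavy-tailed envelope.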
Moment inequalities for functions of independent random variables
A general method for obtaining moment inequalities for functions of
independent random variables is presented. It is a generalization of the
entropy method which has been used to derive concentration inequalities for
such functions [Boucheron, Lugosi and Massart Ann. Probab. 31 (2003)
1583-1614], and is based on a generalized tensorization inequality due to
Latala and Oleszkiewicz [Lecture Notes in Math. 1745 (2000) 147-168]. The new
inequalities prove to be a versatile tool in a wide range of applications. We
illustrate the power of the method by showing how it can be used to
effortlessly re-derive classical inequalities including Rosenthal and
Kahane-Khinchine-type inequalities for sums of independent random variables,
moment inequalities for suprema of empirical processes and moment inequalities
for Rademacher chaos and U-statistics. Some of these corollaries are apparently
new. In particular, we generalize Talagrand's exponential inequality for
Rademacher chaos of order 2 to any order. We also discuss applications for
other complex functions of independent random variables, such as suprema of
Boolean polynomials which include, as special cases, subgraph counting problems
in random graphs.
Published at http://dx.doi.org/10.1214/009117904000000856 in the
Annals of Probability (http://www.imstat.org/aop/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
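The Efron-Stein inequality that underlies the entropy method can be checked numerically. The sketch below (Python, illustrative) estimates both sides of Var f(X) <= (1/2) * sum_i E[(f(X) - f(X^(i)))^2] for f = max of n i.i.d. Uniform(0,1) variables, where X^(i) resamples the i-th coordinate independently:

```python
import random

def efron_stein_demo(n=5, reps=20000, seed=0):
    """Monte Carlo comparison of Var f(X) with the Efron-Stein bound
    (1/2) * sum_i E[(f(X) - f(X^(i)))^2] for f = max of n uniforms."""
    rng = random.Random(seed)
    vals, es = [], 0.0
    for _ in range(reps):
        x = [rng.random() for _ in range(n)]
        fx = max(x)
        vals.append(fx)
        for i in range(n):
            y = x[:]
            y[i] = rng.random()  # resample the i-th coordinate
            es += 0.5 * (fx - max(y)) ** 2
    mean = sum(vals) / reps
    var = sum((v - mean) ** 2 for v in vals) / reps
    return var, es / reps

var, bound = efron_stein_demo()
```

For this f the bound holds with some slack (the true variance is n/((n+1)^2(n+2))), which is consistent with the inequality being an upper bound rather than an identity outside the additive case.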
Adaptive compression against a countable alphabet
This paper sheds light on universal coding with respect to classes of memoryless sources over a countable alphabet defined by an envelope function with finite and non-decreasing hazard rate. We prove that the auto-censuring (AC) code introduced by Bontemps (2011) is adaptive with respect to the collection of such classes. The analysis builds on the tight characterization of universal redundancy rates in terms of metric entropy by Haussler and Opper (1997) and on a careful analysis of the performance of the AC-coding algorithm. The latter relies on non-asymptotic bounds for maxima of samples from discrete distributions with finite and non-decreasing hazard rate.
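The hazard rate condition can be made concrete: for a discrete distribution p on {1, 2, ...}, the hazard rate is h(k) = p(k) / P(X >= k), and the geometric distribution has constant hazard. A small sketch (Python; `hazard_rates` is an illustrative helper) computes the hazard rates of a truncated geometric pmf and checks that they are non-decreasing:

```python
def hazard_rates(pmf):
    """Discrete hazard rates h(k) = p(k) / P(X >= k) for a pmf given as a
    list p[0], p[1], ... (support truncated to the list's length)."""
    tail = sum(pmf)
    hs = []
    for p in pmf:
        hs.append(p / tail)
        tail -= p
    return hs

# Geometric(q) has constant hazard rate q; truncating the support makes
# the last few values drift upward, so the sequence stays non-decreasing.
q = 0.3
geom = [q * (1 - q) ** k for k in range(20)]
hs = hazard_rates(geom)
```

Envelopes with non-decreasing hazard rate, as in the class above, are exactly the regime in which the AC code's maxima bounds apply.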
Apprentissage et calculs (Learning and computation)
Model Selection and Error Estimation
We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function, and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical VC dimension, empirical VC entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
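The maximal discrepancy penalty can be computed exactly as described: split the sample in half and maximize, over the classifier class, the difference between the empirical errors on the two halves. A sketch for 1-D threshold classifiers h_t(x) = 1[x >= t] (Python; the function name and the brute-force scan over thresholds are our illustrative choices):

```python
def max_discrepancy(xs, ys):
    """Maximal discrepancy for 1-D threshold classifiers h_t(x) = 1[x >= t],
    including both label orientations: the maximum over t of the error on
    the first half minus the error on the second half."""
    n = len(xs) // 2
    pairs = list(zip(xs, ys))
    first, second = pairs[:n], pairs[n:]
    thresholds = sorted(set(xs)) + [float("inf")]
    best = 0.0
    for t in thresholds:
        for flip in (0, 1):
            def err(half):
                return sum((x >= t) != (y ^ flip) for x, y in half) / len(half)
            best = max(best, err(first) - err(second))
    return best

# Halves with opposite constant labels can be perfectly discriminated,
# so the discrepancy is 1.0.
d = max_discrepancy([1, 2, 3, 4], [1, 1, 0, 0])
```

As the abstract notes, this computation is an empirical risk minimization with the second half's labels flipped, so any ERM routine for the class can evaluate the penalty.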
Pattern Coding Meets Censoring: (almost) Adaptive Coding on Countable Alphabets
Adaptive coding faces the following problem: given a collection of source classes such that each class in the collection has a non-trivial minimax redundancy rate, can we design a single code which is asymptotically minimax over each class in the collection? In particular, adaptive coding makes sense when there is no universal code on the union of classes in the collection. In this paper, we deal with classes of sources over an infinite alphabet that are characterized by a dominating envelope. We provide asymptotic equivalents for the redundancy of envelope classes enjoying a regular variation property. We finally construct a computationally efficient online prefix code, which interleaves the encoding of the so-called pattern of the message and the encoding of the dictionary of discovered symbols. This code is shown to be adaptive, within a log log n factor, over the collection of regularly varying envelope classes. The code is both simpler and less redundant than previously described contenders. In contrast with previous attempts, it also covers the full range of slowly varying envelope classes.
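The "pattern" of a message mentioned above replaces each symbol by the rank of its first occurrence, so that the dictionary of discovered symbols can be encoded separately; for example, "abracadabra" has pattern 1 2 3 1 4 1 5 1 2 3 1. A minimal sketch (Python):

```python
def pattern(msg):
    """Pattern of a message: each symbol is replaced by the order in
    which it was first discovered; the actual symbols (the dictionary)
    are carried by a separate stream, as in pattern coding."""
    ranks, out = {}, []
    for s in msg:
        if s not in ranks:
            ranks[s] = len(ranks) + 1
        out.append(ranks[s])
    return out
```

Interleaving the pattern stream with the dictionary stream is what lets the code above remain a prefix code while deferring the costly symbol identities.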